“How have the ratings of Game of Thrones episodes evolved over time across different seasons?”
This research question explores trends in the data, such as whether ratings rose or fell over the course of the show and whether any sharp drops followed key plot developments.
The raw data was obtained from the IMDb website and is publicly accessible at:
https://www.imdb.com/title/tt0944947/ratings/
The data used in this analysis was extracted from the ‘Ratings by episode’ section.
library(tidyverse) #for data manipulation, visualization, and analysis
library(here) #simplifies paths
library(readxl) #to read excel files in R
library(knitr) #runs the R code and inserts the results back into the document
library(dplyr) #for data manipulation tasks
library(ggimage) #to add images to the legend
library(jpeg) #to read jpeg files
library(gganimate) #to create an animated plot
#load data from excel file
rawdata <- read_excel(here::here("raw_data", "raw_data.xlsx"))
## New names:
## • `` -> `...1`
The first column of the spreadsheet had an empty header, so read_excel() automatically named it “...1”. First, I print the data to see how it looks immediately after importing.
#this is a sanity check to inspect the data
print(rawdata)
## # A tibble: 8 × 12
## ...1 e1 e2 e3 e4 e5 e6 e7 e8 e9 e10 e11
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 s1 8.9 8.6 8.5 8.6 9 9.1 9.1 8.9 9.6 9.4 9.2
## 2 s2 8.6 8.3 8.7 8.6 8.6 8.9 8.8 8.6 9.7 9.3 NA
## 3 s3 8.6 8.4 8.7 9.5 8.9 8.7 8.6 8.9 9.9 9.1 NA
## 4 s4 9 9.7 8.7 8.7 8.6 9.7 9 9.7 9.6 9.7 NA
## 5 s5 8.3 8.3 8.3 8.5 8.5 7.9 8.8 9.8 9.4 9.1 NA
## 6 s6 8.4 9.2 8.6 9 9.7 8.3 8.5 8.3 9.9 9.9 NA
## 7 s7 8.5 8.8 9.1 9.7 8.7 9 9.4 NA NA NA NA
## 8 s8 7.6 7.9 7.5 5.5 5.9 4 NA NA NA NA NA
The data has imported successfully. The next step is to wrangle it into a form that can be easily visualised.
#rename the first column after automatic assignment of "...1"
colnames(rawdata) <- ifelse(colnames(rawdata) == "...1", "Season", colnames(rawdata))
#rename the remaining columns, leaving the first column (just renamed to "Season") unchanged
colnames(rawdata)[-1] <- paste("Episode", seq_len(ncol(rawdata) - 1))
#this is a sanity check to make sure the column headers changed
head(rawdata, n = 1)
## # A tibble: 1 × 12
## Season `Episode 1` `Episode 2` `Episode 3` `Episode 4` `Episode 5` `Episode 6`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 s1 8.9 8.6 8.5 8.6 9 9.1
## # ℹ 5 more variables: `Episode 7` <dbl>, `Episode 8` <dbl>, `Episode 9` <dbl>,
## # `Episode 10` <dbl>, `Episode 11` <dbl>
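The same renaming can also be written with dplyr verbs. The sketch below is an alternative under stated assumptions, not the approach used in this analysis: `toy` is a hypothetical two-row stand-in for `rawdata`, and the pattern assumes the episode columns are named `e1`, `e2`, and so on.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in for the imported spreadsheet (two seasons, two episodes)
toy <- tibble(`...1` = c("s1", "s2"), e1 = c(8.9, 8.6), e2 = c(8.6, 8.3))

toy_renamed <- toy %>%
  rename(Season = `...1`) %>%                      # name the auto-repaired first column
  rename_with(~ sub("^e", "Episode ", .x),          # "e1" becomes "Episode 1"
              .cols = matches("^e[0-9]+$"))         # only touch columns like e1, e2, ...

names(toy_renamed)
```

Selecting columns by pattern means the code does not need a hard-coded list of eleven names, so it should keep working if a column is added or removed.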
#change the values in the first column
#removing the leading 's' just to clean up the view of the table
rawdata$Season <- sub("^s", "", rawdata$Season)
#render the table with kable
kable(rawdata, format = "markdown")
| Season | Episode 1 | Episode 2 | Episode 3 | Episode 4 | Episode 5 | Episode 6 | Episode 7 | Episode 8 | Episode 9 | Episode 10 | Episode 11 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 8.9 | 8.6 | 8.5 | 8.6 | 9.0 | 9.1 | 9.1 | 8.9 | 9.6 | 9.4 | 9.2 |
| 2 | 8.6 | 8.3 | 8.7 | 8.6 | 8.6 | 8.9 | 8.8 | 8.6 | 9.7 | 9.3 | NA |
| 3 | 8.6 | 8.4 | 8.7 | 9.5 | 8.9 | 8.7 | 8.6 | 8.9 | 9.9 | 9.1 | NA |
| 4 | 9.0 | 9.7 | 8.7 | 8.7 | 8.6 | 9.7 | 9.0 | 9.7 | 9.6 | 9.7 | NA |
| 5 | 8.3 | 8.3 | 8.3 | 8.5 | 8.5 | 7.9 | 8.8 | 9.8 | 9.4 | 9.1 | NA |
| 6 | 8.4 | 9.2 | 8.6 | 9.0 | 9.7 | 8.3 | 8.5 | 8.3 | 9.9 | 9.9 | NA |
| 7 | 8.5 | 8.8 | 9.1 | 9.7 | 8.7 | 9.0 | 9.4 | NA | NA | NA | NA |
| 8 | 7.6 | 7.9 | 7.5 | 5.5 | 5.9 | 4.0 | NA | NA | NA | NA | NA |
The table above shows a much cleaner version of the data, but it is not yet ready for visualisation. Before plotting, I reshape the data into long format and remove ‘Episode 11’. This entry is the unaired original pilot: audiences never saw it, and it was simply an alternate version of the official pilot episode that was released. It is therefore not required in the analysis and is excluded from the final dataset.
# Reshape the data to long format, a more flexible structure for visualising, analysing, and modelling that is easier for ggplot2 to handle.
rawdata_long <- rawdata %>%
  pivot_longer(cols = starts_with("Episode"), # Select columns that start with "Episode"
               names_to = "Episode",          # Create a new column "Episode"
               values_to = "Rating")          # Create a new column "Rating"
# Exclude Episode 11 (the unaired pilot) from the data
data <- rawdata_long %>%
  filter(Episode != "Episode 11")
#this is a sanity check to make sure the data is now in a long format
head(data)
## # A tibble: 6 × 3
## Season Episode Rating
## <chr> <chr> <dbl>
## 1 1 Episode 1 8.9
## 2 1 Episode 2 8.6
## 3 1 Episode 3 8.5
## 4 1 Episode 4 8.6
## 5 1 Episode 5 9
## 6 1 Episode 6 9.1
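As a hedged aside, `pivot_longer()` can also strip the "Episode " prefix, convert the episode to an integer, and drop missing ratings in a single call. The sketch below uses a hypothetical `toy_wide` table rather than the real data, and the column name `EpisodeNumber` is chosen here for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical wide table: season 2 has no rating for episode 2
toy_wide <- tibble(
  Season = c("1", "2"),
  `Episode 1` = c(8.9, 8.6),
  `Episode 2` = c(8.6, NA)
)

toy_long <- pivot_longer(
  toy_wide,
  cols = starts_with("Episode"),
  names_to = "EpisodeNumber",
  names_prefix = "Episode ",                          # remove the text before the number
  names_transform = list(EpisodeNumber = as.integer), # store the episode as an integer
  values_to = "Rating",
  values_drop_na = TRUE                               # skip season/episode pairs with no rating
)

toy_long
```

Dropping the NA rows at this stage would avoid the missing-value warnings that ggplot2 emits when it meets episodes that do not exist in shorter seasons.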
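# Save the data as a CSV file, which can be opened in Excel.
# here::here() builds the path from the project root, as was done when reading the raw data.
write.csv(data, here::here("data", "data.csv"), row.names = FALSE)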
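`write.csv()` fails if the target folder does not exist, so it can help to create the folder first. The sketch below is a minimal illustration that writes to a temporary directory; `toy`, `out_dir`, and `out_file` are hypothetical names, and in the real project the folder would be the project's data directory.

```r
# Hypothetical data frame standing in for the cleaned ratings
toy <- data.frame(Season = c("1", "1"),
                  Episode = c("Episode 1", "Episode 2"),
                  Rating = c(8.9, 8.6))

# Create the output folder if it is missing (no warning if it already exists)
out_dir <- file.path(tempdir(), "data")
dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)

# Write the file, then read it back as a sanity check
out_file <- file.path(out_dir, "data.csv")
write.csv(toy, out_file, row.names = FALSE)
check <- read.csv(out_file)
```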
#Create a basic line plot with minimal customisation
p <- ggplot(data, aes(x = as.integer(str_replace(Episode, "Episode ", "")), # Convert episode to numeric
                      y = Rating,
                      color = factor(Season))) + # Use Season for different lines
  geom_line() +                        # Draw lines
  geom_point() +                       # Add points for each episode
  labs(x = "Episode Number",           # Label for x-axis
       y = "Episode Rating",           # Label for y-axis
       color = "Season Number") +      # Label for the legend
  theme_minimal() +                    # Use a minimal theme for a clean look
  scale_color_viridis_d() +            # Add color scale for different lines
  theme(legend.position = "right")     # Place the legend at the right
#view the plot as a sanity check to assess what direction to take the customisations.
print(p)
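In the basic plot, the episode number was extracted from the Episode column inside the call to ggplot(). To make the x-axis scale easier to customise, I recode the Episode column into a separate numeric column.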
# Preprocess the Episode column to extract numeric episode numbers. It removes the "Episode " part of the string and converts the remaining number (e.g., "1", "2") into an integer; creating a new column 'EpisodeNumber'
data1 <- data %>%
mutate(EpisodeNumber = as.integer(str_replace(Episode, "Episode ", "")))
# This is a sanity check to view the new column
print(data1)
## # A tibble: 80 × 4
## Season Episode Rating EpisodeNumber
## <chr> <chr> <dbl> <int>
## 1 1 Episode 1 8.9 1
## 2 1 Episode 2 8.6 2
## 3 1 Episode 3 8.5 3
## 4 1 Episode 4 8.6 4
## 5 1 Episode 5 9 5
## 6 1 Episode 6 9.1 6
## 7 1 Episode 7 9.1 7
## 8 1 Episode 8 8.9 8
## 9 1 Episode 9 9.6 9
## 10 1 Episode 10 9.4 10
## # ℹ 70 more rows
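The reason the numeric column matters can be seen with a small example: when episode labels are treated as text, "Episode 10" sorts before "Episode 2". The sketch below uses a hypothetical three-element vector and base R only.

```r
episodes <- c("Episode 1", "Episode 2", "Episode 10")

# Sorting as text compares character by character, so "10" lands before "2"
sorted_as_text <- sort(episodes, method = "radix")

# Extracting the number first gives the intended episode order
episode_number <- as.integer(sub("Episode ", "", episodes, fixed = TRUE))
sorted_as_number <- episodes[order(episode_number)]

sorted_as_text
sorted_as_number
```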
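Next, I customise the colours. The Season column is converted to a factor with readable labels, and each season is assigned a colour associated with one of the houses in the show.
# Convert the 'Season' column (currently stored as character) to a factor with appropriate labels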
data1$Season <- factor(data1$Season,
levels = 1:8,
labels = c("Season 1", "Season 2", "Season 3", "Season 4", "Season 5", "Season 6", "Season 7", "Season 8"))
# Assign custom colors to each line based on the season
custom_colors <- c(
"Season 1" = "#7f7f7f", # Grey for Season 1, House Stark
"Season 2" = "#ffc406", # Yellow for Season 2, House Baratheon
"Season 3" = "#006400", # Green for Season 3, House Tyrell
"Season 4" = "#ED7014", # Orange for Season 4, House Martell
"Season 5" = "#B03060", # Maroon for Season 5, House Lannister
"Season 6" = "#023E8A", # Blue for Season 6, House Arryn
"Season 7" = "#000000", # Black for Season 7, House Greyjoy
"Season 8" = "#ff0000" # Red for Season 8, House Targaryen
)
Season 1 : #7f7f7f
Season 2 : #ffc406
Season 3 : #006400
Season 4 : #ED7014
Season 5 : #B03060
Season 6 : #023E8A
Season 7 : #000000
Season 8 : #ff0000
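`scale_color_manual()` matches a named colour vector to factor levels by name, so a mismatch between a label and a name leaves that season without its intended colour. A minimal sanity check, sketched here with a hypothetical three-value `season` vector and a palette copied from the colours above, could look like this:

```r
# Labels used for the factor levels
season_levels <- paste("Season", 1:8)

# Hypothetical character column, converted the same way as in the analysis
season <- factor(c("1", "8", "3"), levels = 1:8, labels = season_levels)

# Named palette: the names must match the factor labels exactly
palette <- c(
  "Season 1" = "#7f7f7f", "Season 2" = "#ffc406",
  "Season 3" = "#006400", "Season 4" = "#ED7014",
  "Season 5" = "#B03060", "Season 6" = "#023E8A",
  "Season 7" = "#000000", "Season 8" = "#ff0000"
)

# Any level listed here has no colour assigned to it
missing_colours <- setdiff(levels(season), names(palette))
missing_colours
```

An empty result means every level has a matching colour.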
# Create the plot with new customisations
p1 <- ggplot(data1, aes(x = EpisodeNumber, y = Rating, color = Season)) + # Season is already a factor
  geom_line() +
  geom_point() +
  labs(x = "Episode Number",
       y = "Episode Rating",
       color = "",                       # Leave the legend title empty
       caption = "Source: IMDb.com") +   # Add source text at the bottom
  ggtitle("Game of Thrones Episode Ratings Per Season") + # Add a title
  theme_minimal() +                                        # Clean, minimal theme
  scale_color_manual(values = custom_colors) +             # Apply custom colors for lines
  scale_x_continuous(breaks = seq(1, max(data1$EpisodeNumber), by = 1)) + # Set x-axis breaks
  scale_y_continuous(
    breaks = seq(4, 10, by = 0.5), # Set y-axis breaks
    limits = c(4, 10),             # Set y-axis limits
    expand = c(0, 0)               # Remove extra padding
  ) +
  theme(legend.position = "right") + # Position the legend on the right
  guides(color = guide_legend(
    keywidth = 2,                  # Width of each legend key (box around the color circle)
    keyheight = 2,                 # Height of each legend key (box around the color circle)
    override.aes = list(size = 5)  # Increase the size of the color circles inside the legend
  ))
# Display the plot
print(p1)
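A numeric summary can complement the plot when judging how ratings changed between seasons. The sketch below is illustrative only: it uses the Season 1 (episodes 1 to 10) and Season 8 ratings from the table above, and the names `ratings` and `season_summary` are hypothetical.

```r
library(dplyr)
library(tibble)

# Ratings copied from the table above for the first and last seasons
ratings <- tibble(
  Season = rep(c("Season 1", "Season 8"), times = c(10, 6)),
  Rating = c(8.9, 8.6, 8.5, 8.6, 9.0, 9.1, 9.1, 8.9, 9.6, 9.4,
             7.6, 7.9, 7.5, 5.5, 5.9, 4.0)
)

# Mean, lowest rating, and number of episodes per season
season_summary <- ratings %>%
  group_by(Season) %>%
  summarise(
    MeanRating = mean(Rating, na.rm = TRUE),
    MinRating = min(Rating, na.rm = TRUE),
    Episodes = n()
  )

season_summary
```

The same pipeline could be applied to `data1` to compare all eight seasons.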
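# Animate the plot one episode at a time, keeping earlier episodes on screen
# (p1 already contains geom_point(), so it is not added again)
anim <- p1 +
  transition_manual(EpisodeNumber, cumulative = TRUE) +
  labs(
    subtitle = "Episode: {current_frame}" # Dynamic subtitle showing the episode currently displayed
  )
print(anim)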
# Alternative animation: gradually reveal each line along the episode axis
anim2 <- p1 +
  transition_reveal(EpisodeNumber) +
  labs(
    subtitle = "Episode: {frame_along}" # Dynamic subtitle showing the current position along the episode axis
  )
print(anim2)
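Printing an animation renders it with default settings. To control the length and speed, or to save the result as a file, gganimate provides `animate()` and `anim_save()`. The sketch below makes several assumptions: the toy data frame, the frame settings, and the file name `ratings.gif` are all hypothetical, and the GIF renderer requires the gifski package to be installed. Rendering is wrapped in `interactive()` so that the slow step only runs in a live session.

```r
library(ggplot2)
library(gganimate)

# Hypothetical toy data: three episodes from two seasons
toy <- data.frame(
  EpisodeNumber = rep(1:3, times = 2),
  Rating = c(8.9, 8.6, 8.5, 7.6, 7.9, 7.5),
  Season = rep(c("Season 1", "Season 8"), each = 3)
)

# Build the animation object; nothing is rendered at this point
toy_anim <- ggplot(toy, aes(x = EpisodeNumber, y = Rating, color = Season)) +
  geom_line() +
  transition_reveal(EpisodeNumber)

if (interactive()) {
  # Render with an explicit frame count and frame rate, then save to disk
  rendered <- animate(toy_anim, nframes = 50, fps = 10,
                      renderer = gifski_renderer())
  anim_save("ratings.gif", animation = rendered)
}
```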